We study the adaptation dynamics of an initially maladapted population evolving via the elementary processes of mutation and selection. The evolution occurs on rugged fitness landscapes, which are defined on the multi-dimensional genotypic space and have many local peaks separated by low fitness valleys. We mainly focus on Eigen's model, which describes the deterministic dynamics of an infinite number of self-replicating molecules. In the stationary state, for small mutation rates, such a population forms a quasispecies which consists of the fittest genotype and its closely related mutants. The quasispecies dynamics on rugged fitness landscapes follow a punctuated (or step-like) pattern in which a population jumps from a low fitness peak to a higher one, stays there for a considerable time before shifting the peak again, and eventually reaches the global maximum of the fitness landscape. We calculate exactly several properties of this dynamical process within a simplified version of the quasispecies model.




I. INTRODUCTION 

Consider a maladapted population such as a bacterial colony in a glucose-limited environment, or a viral population in a vaccinated animal cell. In such harsh environments, the less fit members of the population are likely to perish and only the highly fit ones can survive to the next generation. In this manner, the fitness of the population increases with time and the initially maladapted population evolves to a well-adapted state. In the last century, there has been a concerted effort to put this verbal theory of Darwin [1] on a solid quantitative footing by performing long-term experiments on microbial populations and by studying theoretical models of biological evolution.

One of the questions in evolutionary biology concerns the mode of evolution. In experiments on microbes, the fitness of a maladapted population is found to increase with time either in a smooth, continuous manner or in sudden jumps […]. The latter mode is consistent with evolution on a fitness landscape defined on genotypic space with many local peaks separated by fitness valleys. On such a rugged fitness landscape, a low fitness population initially climbs uphill until it reaches a local peak, where it gets trapped because a better peak lies some mutational distance away. In a population of realistic size, it takes a finite time for an adaptive mutation to arise, and the fitness stays constant during this time (stasis). Once some beneficial mutants become available, the fitness increases quickly as the population moves to a higher peak, where it can again get stuck. Such dynamics, alternating between stasis and rapid changes in fitness, go on until the population reaches the global maximum.

This punctuated behavior of fitness is also seen in deterministic models that assume infinite population size. An example of such a step-like pattern for the average fitness is shown in Fig. 1. A neat and unambiguous way of defining a step is by considering the fitness of the most populated genotype, also shown in Fig. 1. Since large but finite populations evolve deterministically at short times, it is worthwhile to study punctuated evolution in models with an infinite number of individuals. In this article, we briefly describe some exact results concerning the dynamics of an infinitely large population on rugged fitness landscapes […]. We will find that the step-like behavior is produced not by "valley crossing", as in finite populations, but by a fitter population "overtaking" a less fit one, as described in the subsequent sections.

II. QUASISPECIES MODEL AND ITS STEADY STATE 

We consider an infinitely large population reproducing asexually via the elementary processes of selection and mutation. Each individual in the population carries a binary string $\sigma = \{\sigma_1, \ldots, \sigma_L\}$ of length $L$, where $\sigma_i = 0$ or $1$. The $2^L$ sequences are arranged on the multi-dimensional Hamming space. The information about the environment is encoded in the fitness landscape, defined as a map from the sequence space into the real numbers, which is generated by associating a non-negative real number $W(\sigma)$ with each sequence $\sigma$. Fitness landscapes can be simple, possessing some symmetry properties such as permutation invariance, or complex, devoid of any such symmetries [7, 8]. Fitness functions with a single peak are an example of simple fitness landscapes, while rugged landscapes with many hills and valleys belong to the latter class.

FIG. 1: (Color online) Punctuated change in the average population fitness (dotted line) and the fitness of the most populated genotype (solid line) for an infinite population evolving on a maximally rugged fitness landscape. Here genome length $L = 15$ and mutation probability $\mu$ = 10~*.

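The average population fraction $X(\sigma,t)$ with sequence $\sigma$ at time $t$ follows mutation-selection dynamics described by the following discrete time equation […]:

$$X(\sigma,t+1) = \frac{\sum_{\sigma'} p_{\sigma\leftarrow\sigma'}\, W(\sigma')\, X(\sigma',t)}{\sum_{\sigma'} W(\sigma')\, X(\sigma',t)} . \qquad (1)$$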

The last two factors in the numerator of the above equation give the population fraction produced when a sequence $\sigma'$ copies itself with replication probability $W(\sigma')$, since fitness is defined as the average number of offspring produced per generation. After the reproduction process, point mutations are introduced independently at each locus of the sequence $\sigma'$ with probability $\mu$ per generation. Thus, a sequence $\sigma$ is obtained via mutations in $\sigma'$ with probability

$$p_{\sigma\leftarrow\sigma'} = \mu^{d(\sigma,\sigma')}\,(1-\mu)^{L-d(\sigma,\sigma')} , \qquad (2)$$

where the Hamming distance $d(\sigma,\sigma')$ is the number of loci at which the sequences $\sigma$ and $\sigma'$ differ. The denominator of (1) is the average fitness of the population at time $t$, which ensures that the total population fraction $\sum_\sigma X(\sigma,t) = 1$ is conserved.
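A direct numerical iteration of Eq. (1) makes the roles of selection and mutation explicit. The Python sketch below is a minimal brute-force illustration under assumptions of our own: genotypes are encoded as $L$-bit integers, a copy that differs from its parent at $d$ loci arises with probability $\mu^d (1-\mu)^{L-d}$, fitnesses are drawn uniformly from $(0,1)$ as a stand-in for a maximally rugged landscape, and the helper names (`hamming`, `mutation_matrix`, `quasispecies_step`) are hypothetical. Each step costs $O(4^L)$ operations, so it is only practical for small $L$.

```python
import random

def hamming(a: int, b: int) -> int:
    """Hamming distance between two genotypes encoded as L-bit integers."""
    return bin(a ^ b).count("1")

def mutation_matrix(L: int, mu: float) -> list[list[float]]:
    """p[s][s2]: probability that a copy of genotype s2 becomes s,
    i.e. mu**d * (1 - mu)**(L - d) with d the Hamming distance."""
    n = 2 ** L
    return [[mu ** hamming(s, s2) * (1 - mu) ** (L - hamming(s, s2))
             for s2 in range(n)] for s in range(n)]

def quasispecies_step(x, w, p):
    """One generation of Eq. (1): reproduce with weight W, mutate, renormalise."""
    n = len(x)
    offspring = [w[s2] * x[s2] for s2 in range(n)]
    mean_fitness = sum(offspring)  # denominator of Eq. (1)
    return [sum(p[s][s2] * offspring[s2] for s2 in range(n)) / mean_fitness
            for s in range(n)]

# Illustrative run (assumed parameters): L = 6, mu = 0.01, uniform random fitnesses.
L, mu = 6, 0.01
rng = random.Random(42)
w = [rng.random() for _ in range(2 ** L)]
p = mutation_matrix(L, mu)
x = [1.0 if s == 0 else 0.0 for s in range(2 ** L)]  # start on genotype 0
for _ in range(200):
    x = quasispecies_step(x, w, p)
leader = max(range(2 ** L), key=lambda s: x[s])
print("most populated genotype:", leader, "with fitness", round(w[leader], 3))
```

Because every column of the mutation matrix sums to one, dividing by the mean fitness keeps $\sum_\sigma X(\sigma,t)$ equal to one at every step.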

The stationary state of the quasispecies equation (1) has been studied extensively in the last two decades for various fitness landscapes. These numerical and analytical studies have shown that for most landscapes there exists a critical mutation rate $\mu_c$ below which the population forms a quasispecies consisting of the fittest genotype and its closely related mutants, while above it the population delocalises over the whole sequence space. This error threshold phenomenon can be easily demonstrated for a single peak fitness landscape defined as

$$W(\sigma) = W_0\,\delta_{\sigma,\sigma_0} + \left(1 - \delta_{\sigma,\sigma_0}\right) , \quad W_0 > 1 , \qquad (3)$$

where $\sigma_0$ is the fittest sequence. In the limit $\mu \to 0$, $L \to \infty$ keeping $U = \mu L$ fixed, the frequency of the fittest sequence in the steady state of (1) is given by

$$X(\sigma_0) = \frac{W_0\, e^{-U} - 1}{W_0 - 1} , \qquad (4)$$

which is an acceptable solution provided $U < U_c = \ln W_0$. For $U > U_c$, selection is unable to counter the delocalising effects of mutation and the population cannot be maintained at the fitness peak. For a discussion of the error threshold phenomenon on other fitness landscapes and of generalisations of the basic quasispecies equation (1), we refer the reader to […].
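The error threshold can be seen in a one-variable iteration for the fraction $X_0$ of the fittest sequence on the landscape (3). The sketch assumes, as appropriate in the limit $\mu \to 0$ with $U = \mu L$ fixed, that the master sequence is copied without error with probability $(1-\mu)^L \to e^{-U}$ and that back mutations onto $\sigma_0$ are negligible. It then compares the fixed point with the closed form $X_0 = (W_0 e^{-U} - 1)/(W_0 - 1)$, set to zero above $U_c = \ln W_0$. The function names and the value $W_0 = 2$ are illustrative assumptions.

```python
import math

def master_fraction(W0: float, U: float, generations: int = 5000) -> float:
    """Iterate X0 -> e^{-U} W0 X0 / (W0 X0 + 1 - X0) towards its fixed point.

    The denominator is the mean fitness on the single peak landscape;
    back mutations onto the peak are neglected (assumption).
    """
    x = 1.0  # whole population starts on the peak
    for _ in range(generations):
        x = math.exp(-U) * W0 * x / (W0 * x + 1.0 - x)
    return x

def predicted_fraction(W0: float, U: float) -> float:
    """Closed form for X(sigma_0), zero above the error threshold U_c = ln W0."""
    return max(0.0, (W0 * math.exp(-U) - 1.0) / (W0 - 1.0))

W0 = 2.0
for U in (0.1, 0.3, 0.5, 1.0, 1.5):
    print(f"U = {U:.1f}: iterated {master_fraction(W0, U):.4f}, "
          f"predicted {predicted_fraction(W0, U):.4f}")
```

Below $U_c = \ln 2 \approx 0.693$ the iteration settles on the nonzero fixed point; above it, the fraction on the peak decays to zero.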
III. QUASISPECIES DYNAMICS ON RUGGED FITNESS LANDSCAPES

We now turn our attention to the dynamical evolution of $X(\sigma,t)$ on rugged fitness landscapes. We consider maximally rugged fitness landscapes, for which the fitness $W(\sigma)$ is a random variable chosen independently from a common distribution. It is useful to introduce the unnormalised population defined as

$$Z(\sigma,t) = X(\sigma,t) \prod_{\tau=0}^{t-1} \sum_{\sigma'} W(\sigma')\, X(\sigma',\tau) , \qquad (5)$$

in terms of which the nonlinear evolution (1) reduces to the following linear iteration:

$$Z(\sigma,t+1) = \sum_{\sigma'} p_{\sigma\leftarrow\sigma'}\, W(\sigma')\, Z(\sigma',t) . \qquad (6)$$
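Because $\sum_\sigma Z(\sigma,t)$ equals the product of the mean fitnesses up to time $t-1$, normalising the solution of the linear iteration (6) must give back $X(\sigma,t)$ from the nonlinear equation (1). The sketch below checks this on a small hypercube; the genome length, mutation probability, number of generations and uniformly distributed fitnesses are arbitrary illustrative choices.

```python
import random

def hamming(a: int, b: int) -> int:
    """Hamming distance between genotypes encoded as L-bit integers."""
    return bin(a ^ b).count("1")

L, mu, T = 5, 0.02, 40  # illustrative, assumed parameters
n = 2 ** L
rng = random.Random(1)
W = [rng.random() for _ in range(n)]
p = [[mu ** hamming(s, s2) * (1 - mu) ** (L - hamming(s, s2)) for s2 in range(n)]
     for s in range(n)]

x = [1.0 if s == 0 else 0.0 for s in range(n)]  # X(sigma, 0)
z = list(x)                                     # Z(sigma, 0) = X(sigma, 0)
for _ in range(T):
    # Nonlinear iteration, Eq. (1)
    grown = [W[s] * x[s] for s in range(n)]
    mean_w = sum(grown)
    x = [sum(p[s][s2] * grown[s2] for s2 in range(n)) / mean_w for s in range(n)]
    # Linear iteration, Eq. (6): no normalisation at all
    z = [sum(p[s][s2] * W[s2] * z[s2] for s2 in range(n)) for s in range(n)]

total_z = sum(z)
max_err = max(abs(z[s] / total_z - x[s]) for s in range(n))
print("largest deviation between Z / sum(Z) and X:", max_err)
```

The linear form is what makes the later analysis tractable: each sequence's unnormalised population can be followed without tracking the mean fitness.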

Since at the beginning of the adaptation process the population finds itself at a low fitness genotype, we start with the initial condition $X(\sigma,0) = Z(\sigma,0) = \delta_{\sigma,\sigma^{(0)}}$, where $\sigma^{(0)}$ is a randomly chosen sequence. For mutation probability $\mu \to 0$, after one iteration we have

$$X(\sigma,1) \approx \mu^{d(\sigma,\sigma^{(0)})} . \qquad (7)$$

Thus in an infinite population model, each sequence gets populated in one generation, obviating the need for the "valley crossing" which is required in finite populations. Although an exact solution of (6) for $t > 1$ is not available, it is possible to obtain several asymptotically exact results concerning the most populated genotype using a simplified version of the quasispecies dynamics. Numerical simulations in [10] showed that dynamical properties involving the most populated genotype are well described by a simplified model which approximates the population $Z(\sigma,t)$ in (6) by

$$Z(\sigma,t) \approx Z(\sigma,1)\, W^{t-1}(\sigma) , \quad t > 1 . \qquad (8)$$

This model ignores mutations once each sequence has been populated and allows the population at each sequence to grow with its own fitness. A recent perturbative analysis in the small parameter $\mu$, however, shows that this approximation holds for highly fit sequences and at short times […].

Writing $W(\sigma) = e^{F(\sigma)}$ and rescaling time by $|\ln \mu|$ in (8), we find that the logarithmic population $E(\sigma,t) = \ln Z(\sigma,t)/|\ln \mu|$ obeys the following linear equation:

$$E(\sigma,t) \approx -d(\sigma,\sigma^{(0)}) + F(\sigma)\, t . \qquad (9)$$

The linear evolution of the (logarithmic) population of the $2^L$ sequences for $L = 4$ is shown in Fig. 2(a). Since the initial population fraction given by (7) is the same for all sequences at a given Hamming distance $d(\sigma,\sigma^{(0)})$ from $\sigma^{(0)}$, $\binom{L}{d}$ lines emanate from the same intercept. However, since among the lines sharing an intercept only the genotype with the largest slope (fitness) has the potential to become the most populated sequence, we arrive at the model in Fig. 2(b), in which $L+1$ genotypes are retained, each of whose fitness $F(k)$, $k = 0, \ldots, L$, is an independent but non-identically distributed random variable […].
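The reduction from the $2^L$ lines of (9) to $L+1$ lines can be checked directly: at any time, the most populated genotype of the full model should lie at some distance $k$ and carry the largest fitness $F(k)$ among the $\binom{L}{k}$ genotypes at that distance. In this Python sketch the fitnesses $F(\sigma)$ are drawn i.i.d. from the exponential distribution, $\sigma^{(0)}$ is taken to be the all-zero string, and the helper names are hypothetical.

```python
import random

def hamming_weight(s: int) -> int:
    """Distance d(sigma, sigma^(0)) when sigma^(0) is the all-zero string."""
    return bin(s).count("1")

def full_winner(F: list[float], t: float) -> int:
    """Genotype maximising E(sigma, t) = -d(sigma, sigma^(0)) + F(sigma) t."""
    return max(range(len(F)), key=lambda s: -hamming_weight(s) + F[s] * t)

def reduced_fitnesses(F: list[float], L: int) -> list[float]:
    """F(k): the largest fitness among the genotypes at distance k."""
    best = [float("-inf")] * (L + 1)
    for s, f in enumerate(F):
        k = hamming_weight(s)
        best[k] = max(best[k], f)
    return best

def reduced_winner(Fk: list[float], t: float) -> int:
    """Distance k maximising -k + F(k) t in the reduced model."""
    return max(range(len(Fk)), key=lambda k: -k + Fk[k] * t)

L = 8
rng = random.Random(3)
F = [rng.expovariate(1.0) for _ in range(2 ** L)]
Fk = reduced_fitnesses(F, L)
times = [0.05 * i for i in range(1, 400)]
agree = all(hamming_weight(full_winner(F, t)) == reduced_winner(Fk, t) for t in times)
print("full and reduced models agree on the leader:", agree)
```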

In a sequence $\{F(k)\}$ of random variables, a record is said to occur at $m$ if $F(m) > F(k)$ for all $k < m$. In Fig. 2(b), the sequences at distance $k = 0$, 2 and 3 from the initial sequence are records, but the sequence at $k = 2$ never becomes the most populated genotype. To qualify as a jump, it is not sufficient to have a record fitness; the population must also overtake the current leader in the shortest time among all contenders. Because of this minimization of the overtaking time, records and jumps have different statistical properties, which we describe briefly in the next subsections.
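For the reduced model of Fig. 2(b), records and jumps can be extracted from a single realization of $F(0), \ldots, F(L)$. In this sketch, $F(k)$ is drawn as the maximum of $\binom{L}{k}$ i.i.d. exponential variables by inverting the distribution function $(1 - e^{-F})^{\binom{L}{k}}$; the leader is then followed by repeatedly choosing, among the contenders $k' > k$ with $F(k') > F(k)$, the one whose overtaking time $t = (k'-k)/(F(k') - F(k))$ is smallest. The exponential choice and the function names are illustrative assumptions.

```python
import math
import random

def sample_max_exp(n: int, rng: random.Random) -> float:
    """Largest of n i.i.d. Exp(1) variables: solve (1 - e^{-F})^n = u for F."""
    u = rng.random()
    while u == 0.0:  # random() can return exactly 0.0
        u = rng.random()
    return -math.log(-math.expm1(math.log(u) / n))

def reduced_fitnesses(L: int, rng: random.Random) -> list[float]:
    """F(k) for k = 0..L: the best of the C(L, k) fitnesses at distance k."""
    return [sample_max_exp(math.comb(L, k), rng) for k in range(L + 1)]

def records(F: list[float]) -> list[int]:
    """Distances k at which F(k) exceeds F(j) for every j < k."""
    out, best = [], -math.inf
    for k, f in enumerate(F):
        if f > best:
            out.append(k)
            best = f
    return out

def jumps(F: list[float]) -> list[int]:
    """Successive leaders of E(k, t) = -k + F(k) t, starting from k = 0.

    From the current leader k, the next leader is the contender k' > k with
    F(k') > F(k) whose overtaking time (k' - k) / (F(k') - F(k)) is smallest.
    """
    leader, path = 0, [0]
    while True:
        best_t, nxt = math.inf, None
        for k2 in range(leader + 1, len(F)):
            if F[k2] > F[leader]:
                t = (k2 - leader) / (F[k2] - F[leader])
                if t < best_t:
                    best_t, nxt = t, k2
        if nxt is None:  # the leader is the global maximum
            return path
        leader = nxt
        path.append(leader)

rng = random.Random(7)
F = reduced_fitnesses(20, rng)
print("records:", records(F))
print("jumps:  ", jumps(F))
```

Every entry of the jump list also appears in the record list, while the converse need not hold: a record whose overtaking time is not the smallest is skipped by the leader.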



A. Statistics of records 



Although the record statistics for independent and identically distributed (i.i.d.) random variables are well studied, much less is known when the variables are not i.i.d. [11]. Here we have a situation in which $F(k)$ is the maximum of $\alpha_k = \binom{L}{k}$ i.i.d. random variables. Since $F(k)$ is a record only if it is the largest amongst the $\sum_{j=0}^{k} \alpha_j$ variables at distances $0, \ldots, k$, and there are $\alpha_k$ ways of choosing it, the probability $P_k$ that the $k$th fitness is a record is given by […]

$$P_k = \frac{\alpha_k}{\sum_{j=0}^{k} \alpha_j} \approx \frac{L-2k}{L-k} , \quad k < L/2 . \qquad (10)$$
FIG. 2: (a) Evolutionary trajectories $E(\sigma,t)$ defined by (9) for $L = 4$. The bold lines have the largest fitness amongst the $\binom{L}{k}$ fitnesses at distance $k$ from the origin. (b) Evolutionary race: the sequence at distance 3 is the most populated sequence (winner), while the one at distance 2 is a record (contender).



The meaning of the above distribution is intuitively clear: as it is easier to break records in the beginning, the probability to find a record is near unity for $k \ll L$, and it vanishes beyond $L/2$ because the global maximum typically occurs at this distance. The average number $\mathcal{R}$ of records can be obtained by simply integrating $P_k$ over $k$, which yields $\mathcal{R} \approx \int_0^{L/2} dk\, \frac{L-2k}{L-k} = (1 - \ln 2) L$. It is also possible to find the typical spacing $\Delta(j)$ between the $j$th and $(j+1)$th record, where we have labeled the last record (i.e. the global maximum) as $j = 1$. A straightforward calculation shows that the typical inter-record spacing falls as a power law given by […]

$$\Delta(j) \sim \sqrt{L/j} .$$

The above expression indicates that the spacing between the last few records (i.e. $j \sim O(1)$) is of order $\sqrt{L}$, while most of the records are crowded at the beginning, which is consistent with the behavior of the record occurrence probability (10).
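Since record statistics do not depend on the underlying distribution, the average number of records can be estimated by Monte Carlo with any continuous choice; the sketch below uses exponential fitnesses, with $F(k)$ drawn as the maximum of $\alpha_k = \binom{L}{k}$ variables. It compares the estimate with the exact mean $\sum_k \alpha_k / \sum_{j\le k}\alpha_j$ and with the asymptotic value $(1-\ln 2)L$, from which the exact mean differs by a finite-size correction that is small compared with $L$. Parameters and function names are illustrative assumptions.

```python
import math
import random

def sample_max_exp(n: int, rng: random.Random) -> float:
    """Largest of n i.i.d. Exp(1) variables: solve (1 - e^{-F})^n = u for F."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return -math.log(-math.expm1(math.log(u) / n))

def count_records(F: list[float]) -> int:
    """Number of k with F(k) > F(j) for all j < k (k = 0 always counts)."""
    count, best = 0, -math.inf
    for f in F:
        if f > best:
            count, best = count + 1, f
    return count

def exact_mean_records(L: int) -> float:
    """Sum over k of alpha_k / (alpha_0 + ... + alpha_k), alpha_k = C(L, k)."""
    total, cumulative = 0.0, 0
    for k in range(L + 1):
        a = math.comb(L, k)
        cumulative += a
        total += a / cumulative  # big-integer true division, rounded to a float
    return total

L, trials = 200, 1000  # illustrative, assumed parameters
rng = random.Random(11)
alphas = [math.comb(L, k) for k in range(L + 1)]
mc_mean = sum(count_records([sample_max_exp(a, rng) for a in alphas])
              for _ in range(trials)) / trials
exact = exact_mean_records(L)
asymptotic = (1 - math.log(2)) * L
print(f"Monte Carlo {mc_mean:.2f}, exact {exact:.2f}, (1 - ln 2) L = {asymptotic:.2f}")
```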



B. Statistics of jumps 

The calculation of jump statistics […] is more involved than that of records because a jump event requires a minimization of the overtaking time. This constraint imposes a condition on the fitnesses of the sequences that can possibly overtake the current leader in a time interval between $t$ and $t + dt$. The sequence at distance $k'$ can overtake the $k$th one (with fitness $F$) at time $t$ if its fitness is $F(k') = [E(k,t) + k']/t$, and it does so in the interval $[t, t+dt]$, with $dt/t \to 0$, if

$$F + \frac{k'-k}{t+dt} = \frac{E(k,t+dt) + k'}{t+dt} < F(k') < \frac{E(k,t) + k'}{t} = F + \frac{k'-k}{t} .$$

Then the total collision rate $W_{k',k}(F,t)$ with which the $k$th sequence is overtaken by the $k'$th one is given by

$$W_{k',k}(F,t) \approx \frac{k'-k}{t^2}\, p_{k'}\!\left(F + \frac{k'-k}{t}\right) , \quad k' > k , \qquad (12)$$

where $p_k(F)$ is the distribution of the maximum of $\alpha_k$ i.i.d. random variables distributed according to $p(F)$ with support over the interval $[F_{\min}, F_{\max}]$. Using this collision rate, we can write the probability $\mathcal{P}_{k',k}(t)$ that the sequence at distance $k'$ overtakes the $k$th one at time $t$ as

$$\mathcal{P}_{k',k}(t) = \int dF\, W_{k',k}(F,t)\, P_k(F,t) , \qquad (13)$$
where the probability $P_k(F,t)$ that the $k$th sequence has the largest population at time $t$ is given by

$$P_k(F,t) \approx p_k(F) \prod_{j \neq k} \int_{F_{\min}}^{F + (j-k)/t} dF'\, p_j(F') . \qquad (14)$$



Note that, unlike the records, the jump properties depend on the underlying distribution of the random variables. Below we present some results for the exponential distribution $p(F) = e^{-F}$, $F \geq 0$.

Integrating (13) over time, we obtain the probability distribution $\mathcal{P}_{k',k}$ that the $k$th sequence is overtaken by the $k'$th sequence:



/ ' 1 


^fc'-fc\ 


/ 7rfc(i - fc) ' 


^ 2fc ) 



P.,.«y^^:^ e-^,fc<fc'<L/2. (15) 

This form of the distribution implies that the overtaking sequence $k'$ is located within a distance $O(\sqrt{k})$ of the overtaken sequence $k$. Thus, for large $k$, the typical spacing between successive jumps is of order $\sqrt{k}$ and hence roughly constant over the range $k = O(L)$, unlike in the case of records discussed in the last subsection. The distribution $P_k$ for a jump to occur at distance $k$ is obtained by integrating over $k'$, and we have […]



^^--\/^fc(rrfc)^-(i-0 (16) 

where $\Theta$ is the Heaviside step function, which takes care of the fact that the record distribution (10) vanishes at distance $L/2$. Instead of integrating over time, by summing over the space variables $k, k'$ in (13), the probability $P(t)$ that a jump occurs at time $t$ can be obtained and is given by […]

The heavy-tailed distribution $P(t)$ can be understood using a simple argument [10] and implies that the mean overtaking time is infinite. Finally, by either summing $P_k$ over $k$ or integrating $P(t)$ over time, the total number of jumps $\mathcal{J}$ is found to scale as $\sqrt{L}$, which is much smaller than the number of records $\mathcal{R}$.
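The contrast between records and jumps can be checked by averaging over realizations of the reduced model with exponential fitnesses, the distribution used in the text. The sketch below counts, for each realization, the records and the distances that ever become the most populated one; both counts include $k = 0$, a convention that may differ by one from the one used above. It prints the mean record count relative to $L$ and the mean jump count relative to $\sqrt{L}$ for inspection. All parameters and function names are illustrative assumptions.

```python
import math
import random

def sample_max_exp(n: int, rng: random.Random) -> float:
    """Largest of n i.i.d. Exp(1) variables: solve (1 - e^{-F})^n = u for F."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return -math.log(-math.expm1(math.log(u) / n))

def count_records(F: list[float]) -> int:
    """Number of k with F(k) > F(j) for all j < k (k = 0 included)."""
    count, best = 0, -math.inf
    for f in F:
        if f > best:
            count, best = count + 1, f
    return count

def count_leaders(F: list[float]) -> int:
    """Number of distances that are ever the most populated one (k = 0 included)."""
    leader, count = 0, 1
    while True:
        best_t, nxt = math.inf, None
        for k2 in range(leader + 1, len(F)):
            if F[k2] > F[leader]:
                t = (k2 - leader) / (F[k2] - F[leader])  # overtaking time
                if t < best_t:
                    best_t, nxt = t, k2
        if nxt is None:
            return count
        leader, count = nxt, count + 1

rng = random.Random(2024)
trials = 200  # illustrative, assumed parameter
results = {}
for L in (50, 100, 200):
    alphas = [math.comb(L, k) for k in range(L + 1)]
    total_r = total_j = 0
    for _ in range(trials):
        F = [sample_max_exp(a, rng) for a in alphas]
        r, j = count_records(F), count_leaders(F)
        assert j <= r  # every jump is also a record
        total_r += r
        total_j += j
    results[L] = (total_r / trials, total_j / trials)
    mean_r, mean_j = results[L]
    print(f"L = {L}: records {mean_r:.1f} (= {mean_r / L:.3f} L), "
          f"jumps {mean_j:.1f} (= {mean_j / math.sqrt(L):.2f} sqrt(L))")
```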



IV. SUMMARY 



In this article, we discussed the steady state and the dynamics of the quasispecies model, which describes a self-replicating population evolving under mutation-selection dynamics. On rugged fitness landscapes, the population fitness increases in a punctuated fashion, and we described several exact results concerning this mode of evolution. Our recent simulations indicate that the law (17) for deterministic populations also holds for finite, stochastically evolving populations […]. At present, we do not have an analytical understanding of the latter result, but it should be possible to test this law in long-term experiments such as those of […] on E. coli.